Yang et at. BMC Systems Biology 201 3, 7:80 
http://www.bionnedcentral.conn/1 752-0509/7/80 



Systems Biology 



RESEARCH ARTICLE Open Access 



Core modular blood and brain biomarkers 
in social defeat mouse model for 
post traumatic stress disorder 

Ruoting Yang^'^'^^ Bernie J Daigle Seid Y Muhie^ Rasha Hammamieh^ Marti Jett^ Linda Petzold^'' 
and Francis J Doyle 1!!^'^' 



Abstract 

Background: Post-traumatic stress disorder (PTSD) is a severe anxiety disorder tliat affects a substantial portion of 
combat veterans and poses serious consequences to long-term health. Consequently, the identification of 
diagnostic and prognostic blood biomarkers for PTSD is of great interest. Previously, we assessed genome-wide 
gene expression of seven brain regions and whole blood in a social defeat mouse model subjected to various 
stress conditions. 

Results: To extract biological insights from these data, we have applied a new computational framework for 
identifying gene modules that are activated in common across blood and various brain regions. Our results, in the 
form of modular gene networks that highlight spatial and temporal biological functions, provide a systems-level 
molecular description of response to social stress. Specifically, the common modules discovered between the brain 
and blood emphasizes molecular transporters in the blood-brain barrier, and the associated genes have significant 
overlaps with known blood signatures for PTSD, major depression, and bipolar disease. Similarly, the common 
modules specific to the brain highlight the components of the social defeat stress response (e.g., fear conditioning 
pathways) in each brain sub-region. 

Conclusions: Many of the brain-specific genes discovered are consistent with previous independent studies of 
PTSD or other mental illnesses. The results from this study further our understanding of the mechanism of stress 
response and contribute to a growing list of diagnostic biomarkers for PTSD. 



Background 

Post-traumatic stress disorder (PTSD) is an anxiety dis- 
order that is triggered after exposure to traumatic events. 
Individuals with PTSD have persistent fear memory and 
often feel emotionally numb. If left untreated, PTSD can 
be life-threatening, as it is often linked with substance 
abuse and severe depression. A study of 289,328 Iraq and 
Afghanistan veterans who were first-time users of Vet- 
erans Affairs (VA) health care between 2002 and 2008 
showed that 22% of veterans were diagnosed with PTSD 
and 17% were diagnosed with depression [1]. Given the 
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predominance of PTSD and its negative consequences to 
long term health, it is very important to identify measur- 
able and quantifiable biological parameters, i.e., bio- 
markers, which can serve as prognostic and diagnostic 
indicators for PTSD. Recent studies have proposed several 
candidate brain gene biomarkers that are associated with 
PTSD [2,3]. Even though PTSD is an illness of the brain, 
taking brain biopsy or spinal fluid is not a viable option 
for diagnosis. Instead, blood can be used as a surrogate for 
brain tissue for the purpose of identifying biomarkers 
[4-8]. Specifically, Rollins et al. recently found over 4,100 
brain transcripts co-expressed in the blood of healthy hu- 
man subjects [9]. Furthermore, it was shown that the 
mRNA levels of certain transcripts in PTSD patients re- 
main changed with respect to controls even 16 years after 
the traumatic event [8,10]. Thus, blood gene expression 
assays are of particular interest for both short-term and 
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long-term diagnosis, prognosis, and treatment of PTSD. 
However, the identification of predictive blood markers re- 
quires the accurate separation of biologically relevant core 
markers from unrelated downstream signals. This task is 
particularly challenging when using surrogate tissues, 
since biological noise from the surrogate is confounded 
with noise from the primary tissue. Fortunately, studies 
performed with model organisms allow the direct assay of 
both surrogate and primary tissues. By characterizing the 
molecular changes present in both tissues simultaneously, 
we can more effectively filter out spurious signals in the 
surrogate. 

We recently used repeated exposures of mice to a 
trained aggressor mouse as a "social defeat" model for 
evaluating PTSD symptoms [11]. This social defeat 
model has often been used to induce anxiety, 
depression-like and avoidance symptoms, which are the 
most prominent psychiatric features of PTSD and com- 
mon co-morbidities. Using a "cage-within-cage resident- 
intruder" protocol (designed to model unpredictable 
threats of daily trauma), we exposed individual subject 
male C57BL/6 J mice to single aggressors for six hours 
daily for 5 or 10 days, and we placed individual control 
subject mice in the same cages but in the absence of any 
other mice. After allowing the subject animals to recover 
for either 1 or 10 days (5 day exposure) or 1 or 42 days 
(10 day exposure), we then collected tissue samples of 
blood and seven brain regions of mice under the differ- 
ent stress conditions and measured gene expression 
levels of these tissues using DNA microarrays. As de- 
scribed in [11], the durations of aggressor exposure were 
chosen to simulate shorter term (5-day) and longer term 
(10-day) stress. The shortest recovery phase duration 
(1 day) was chosen to study the immediate effects of 
stress. The longer of the two recovery phase durations 
for each exposure time were selected based on behav- 
ioral tests conducted throughout the study. These tests 
demonstrated 5 -day exposure defeated mice showed 
signs of recovery around 10 days post-exposure, and 10- 
day exposure defeated mice showed signs of recovery at 
much longer times (up to 42 days post-exposure). Be- 
cause PTSD represents a persistent stress response, it is 
important to identify differentially expressed genes 
(DEGs) active both immediately after the exposure and 
after a long recovery period. Thus, in the current work 
we focus on genes consistently over-/under-expressed 
across all experimental conditions, rather than on DEGs 
from individual conditions (we will address the latter in 
future work). The seven brain regions analyzed in this 
study were chosen due to their known roles in fear 
memory formation, emotion regulation, and decision- 
making — all processes important to the development 
and pathology of PTSD [3]. In particular, the amygdala 
regulates fear memory and emotional aspects; the 



hippocampus is the center for short term memory, and 
the prefrontal cortex controls decision-making. In 
addition, the ventral striatum is strongly associated with 
emotional and motivational aspects of behavior, the stria 
terminalis serves as a major output pathway of the 
amygdala, and the septal area plays a role in reward and 
reinforcement along with the ventral striatum. We note 
that a similar protocol has also recently been used to 
profile social defeat-induced gene expression changes in 
the nucleus accumbens, ventral tegmental area, and 
blood plasma [12-14]. 

The field of systems biology has demonstrated that 
complex diseases such as PTSD are not caused by 
changes in a single gene or pathway. Rather, changes 
occur in a hierarchy of gene modules which collectively 
contribute to disruption of essential cellular functions 
[15-17]. To characterize this module hierarchy, many re- 
searchers have adopted an unsupervised approach 
[17-25] that constructs a network based on gene expres- 
sion data and identifies functional modules based on 
network topology or "guilt-by-association". However, 
these methods usually face the problem of under- 
determination, where the number of interactions to be 
inferred far exceeds the number of independent mea- 
surements [22]. Other studies have adopted a supervised 
network identification approach that begins with a list of 
"seed" genes and gradually expands the list by adding 
interacting genes, ultimately resulting in a compact gene 
module network [26-28]. These supervised approaches 
have shown good performance for classification tasks, 
and we expand upon one of them in this work. 

Previous computational and experimental work sug- 
gests that functional gene modules are highly conserved 
across conditions, tissues, and species [17,29,30]. Direct 
comparisons have been made between multiple mouse 
tissues [21], between human and mouse brains, and be- 
tween human blood and brain tissue [4]. However, mod- 
ules inferred separately from different conditions yield 
partial overlaps at best, which makes drawing compre- 
hensive biological conclusions difficult. Recently, we de- 
veloped a new module identification tool entitled 
COMBINER (COre Module Biomarker Identification 
with Network Exploration) that identifies distinct con- 
served expression modules across various conditions. 
The fundamental idea behind COMBINER is to infer 
candidate modules from data of one condition and valid- 
ate the inferred modules in other conditions using su- 
pervised classification. Those candidate modules that 
perform well in classifying samples from multiple condi- 
tions are then defined as "core modules". There are 
three advantages to this approach: (1) The resulting 
modules are compact and thus exclude unrelated down- 
stream signals; (2) The modules are distinct and well- 
defined with respect to which conditions/tissues/species 
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invoke them; (3) This method provides multiple robust 
discriminative biomarkers co-validated in at least two 
experimental conditions. Given these advantages, we 
have applied a customized version of COMBINER to 
mouse social defeat gene expression data deriving from 
seven brain regions along with blood to identify com- 
mon expression modules. 

In this work, we have attempted to answer two bio- 
logical questions: 1. Which expression modules act in com- 
mon between blood and brain tissue of the social defeat 
mouse model? 2, Which modules act in common between 
different brain regions? To do so, we first performed a pair 
wise comparison of differential gene expression, biological 
pathways, and GO terms between tissues. We then applied 
a new version of COMBINER which we modified in two 
ways (discussed below). First, we used linear models to 
deconvolve time-dependent effects on gene expression 
from effects due to social defeat. Second, we developed an 
improved consensus feature elimination method to iden- 
tify robust modules from data with a relatively small sam- 
ple size. Our results, in the form of blood-brain and brain- 
brain social defeat core module networks, provide a con- 
cise biological description of social defeat and generate 
many candidate PTSD biomarkers for future study. 



Results and discussion 

Overlaps of DEGs/DEGOs/DEPaths 

First, we identified differentially expressed genes (DEGs) 
in each individual tissue across all time points using a 
limma moderated t-test [31]. The numbers of significant 
DEGs (p < 0.05) for each tissue are listed in blue on the di- 
agonal in Figure la. We then established the significance 
of DEG overlaps by computing a hypergeometric p-value 
for each pairwise tissue combination (listed in the off- 
diagonal). Hypergeometric p-values < 0.05 are considered 
significant (cells highlighted in red in Figure la). 

Next, we identified differentially expressed Biological 
Process GO terms (DEGOs) in each tissue by first ranking 
all genes in descending order of limma significance and 
then performing Iterative Group Analysis (iGA) [32] for 
each GO term with < 100 constituent genes. We com- 
puted p-values for each terms iGA score using a null dis- 
tribution obtained via 1000 random permutations of the 
original gene order. The numbers of significant DEGOs 
(p < 0.05) for each tissue are listed in blue on the diagonal 
in Figure lb. We established the significance of DEGO 
overlaps in the same manner as in Figure la. 

Finally, we identified differentially expressed MSigDB 
[33] (www.broadinstitute.org/gsea/msigdb/) canonical 
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Figure 1 Overlap analysis between blood and brain regions. Numbers of significant DEGs (a), DEGOs (b), and DEPATHs (c) are listed in blue 


on the diagonal, while hyper geometric p-values are listed in the off-diagonal. P-values < 0.05 are considered significant (cells highlighted in red). 


We consider the DEPATH overlap between hippocampus and stria terminalis to be marginally significant (red font), as it has a p-value < 0.1 and is 


supported by a highly significant DEG overlap between the same tissues. (HB: hemibrain (hemisphere), AY: amygdala, HS: hippocampus, MPFC: 


medial prefrontal cortex, VS: ventral striatum, SE: septal region and ST: stria terminalis). 
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sub-pathways (DEPATHs) in the same manner as DEGOs 
with the following modification. For each pathway, we 
performed iGA separately for all ordered sub-pathways 
ranging in size from three to 10 genes (when genes are or- 
dered in terms of limma significance). We selected the 
highest scoring sub-pathway and established significance 
as before by repeating the procedure on 1000 random 
gene order permutations. The numbers of significant 
DEPATHs and significant DEPATH overlaps are denoted 
in the same manner as above. 

The overlaps of particular interest include amygdala- 
hippocampus (AY-HC) and hippocampus-stria terminalis 
(HC-ST), as these two scored significantly in the DEG 
comparison and significantly or nearly significantly, re- 
spectively, in the DEPATH comparison. These DEPATHs 
describe processes such as inflammation, diabetes, apop- 
tosis, and immune response. Tables 1 and 2 show the sig- 
nificantly overlapping DEPATHs of AY-HC and HC-ST, 
respectively. We list the original name of each sub- 
pathway along with the following information from the 
iGA sub-pathway analysis conducted on the hippocampus 
data: number of genes in the highest scoring sub-pathway 
(Sig. Genes), sub-pathway permutation p-value, and 
Benjamini-Hochberg corrected sub-pathway false discov- 
ery rate (FDR). We note that none of these pathways 
would have been identified as significant from the hippo- 
campus data alone when using a FDR < 0.05 cut-off. We 
also note significant overlaps in the blood-septal region 
and blood-Hemibrain comparisons, where DEGOs related 
to apoptosis and DEPATHs related to insulin/diabetes, re- 
spectively, were identified. Additional file 1: Table SI and 
Additional file 2: Table S2 contains detailed lists of these 
DEGOs and DEPATHs, respectively. 

Core module network 

Although the differential expression overlap analysis 
provided some biological insight into the pairwise mo- 
lecular similarities between mouse tissues during social 
defeat, overlap results between DEGs, DEGOs, and 
DEPATHs were not always consistent. Overlap analysis 
between multiple tissues is more desired, whfle these 
overlaps are very Hmited due to the high noise-to-signal 
ratio of microarray. In addition, it was not obvious how 
best to combine the results into an overaU biological 

Table 1 Significantly overlapping DEPATHs between 



amygdala and hippocampus 



Name 


Number of 
significant genes 


p-value 


FDR 


BioCarta cytokine pathway 
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<0.002 
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FDR denotes false discovery rate. 



Table 2 Significantly overlapping DEPATHs between 
hippocampus and stria terminalis 



Name 


Number of 
significant genes 


p-value 


FDR 


SA caspase cascade 


6 


0.005 


0.400 


BioCarta ILl R pathway 


10 


0.010 


0.440 


KEGG cytosolic DNA sensing pathway 


9 


0.021 


0.499 


KEGG graft versus host disease 
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0.029 


0.562 


KEGG prostate cancer 


9 


0.030 


0.562 


BioCarta keratinocyte pathway 


9 


0.046 


0.636 



description of mouse social defeat. Thus, we turned to a 
network-level analysis to provide deeper insight. Because 
the desired diagnostic biomarkers should be generally 
over-expressed in both the stress treatment and recovery 
period, we extended the COMBINER method [28] to 
accommodate all four conditions, which resulted in 
multiple-time-segment data. However, we would expect 
an age effect in the control mice. For example, the gene 
expression patterns of control mice in the 10-day treat- 
ment 1-day recovery and 10-day treatment 42-day recov- 
ery groups were significantly different due to mouse age. 
Thus, we used the limma software [31] to deconvolve 
the undesired effects of differing mouse ages as explana- 
tory variables in a linear model, and we subtracted these 
variables from the original gene expression values. We 
then applied COMBINER to the "time standardized" 
data to construct a blood-brain network (common mod- 
ules co-expressed in blood and seven brain regions. 
Figure 2) and a brain-brain network (common modules 
co-expressed in six brain regions. Figure 3). 

Blood-brain network 

We first investigated the expression modules active in 
both blood and multiple brain regions. Starting with the 
top 100 candidate modules (when ranked by pathway ac- 
tivity absolute t-score— see Methods) inferred from 
blood sample data, we identified modules that were also 
active in each brain region. To do so, we removed fea- 
tures using Consensus Feature Elimination until the 
average classification Area Under the ROC Curve (AUC) 
evaluated on each brain region exceeded 0.75 (see [28] 
for additional details). After repeating this procedure 
separately for all brain regions, a total of nine core mod- 
ules remained. Figure 2a presents each modules brain 
region-specific expression patterns. We used average 
time curves (see Methods) to show the time-specific 
expression pattern of the modules as heat maps in 
Figure 2a. Figure 2b further shows the expression of the 
core modules and the protein-protein interactions (PPIs) 
between their gene products. The color of each gene 
denotes its expression level in the blood. Blue lines 
denote known PPIs within modules, while gray lines 
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Figure 2 Blood-brain network, (a) nine expression modules resulted from consensus feature elimination; their brain-specific expression 
locations are indicated with numbered blue circles. Time-specific blood expression patterns of each module are displayed using average time 
curves in the form of expression panels, (b) the blood expression level of each gene in the nine modules is indicated with a colored circle. 
Known protein-protein interactions (PPIs) are marked by lines connecting genes — blue lines denote within-module interactions, while gray lines 
denote between-module interactions, (c) the putative biological functions of the expression modules are listed (as inferred using the KEGG 
annotation). (HB: hemibrain (hemisphere), AY: amygdala, HS: hippocampus, MPFC: medial prefrontal cortex, VS: ventral striatum, SE: septal region 
and ST: stria terminalis; 5D-1D/10D: 5 day treatment, 1 day/10 day recovery, lOD-lD/ 6 W: 10 day treatment, 1 day/6 week recovery). 



denote known PPIs between modules. Figure 2c lists the 
putative biological functions of the core modules; de- 
tailed module information is summarized in Additional 
file 3: Table S3. We note that use of COMBINER 
resulted in seven discriminative blood biomarker sets 
(average 0.81 mean AUG and 0.26 mean error rate) 
which have each been validated using data from one of 
the brain regions. Table 3 lists the final number of mod- 
ules identified from each blood-brain region pair with 
the associated mean AUG and mean error rate. 

The resulting nine core modules represent biological 
functions related to molecular transport, integrin and 
tight junction function, retinol metabolism, cell cycle. 



and mRNA transcription. Although initially inferred 
from blood tissue, most of these processes have been 
previously implicated in normal and pathological brain 
function. For example, tight junctions and ABG efflux 
transporters are present in the blood-brain barrier (EBB) 
and the blood-cerebrospinal fluid barrier (BGSFB) 
[34,35], and SLG transporters encode facilitated trans- 
porters and ion-coupled secondary active transporters 
such as neurotransmitters. The latter also represent the 
major class of transporters used in the delivery of drugs 
to the brain [36]. In addition, overexpressed integrin 
genes lead to vascular remodeling, which is believed to 
be highly correlated with mild Traumatic Brain Injury 
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(See figure on previous page). 

Figure 3 Brain-brain networlc. (a) application of COMBINER to brain data yields thirty-seven core modules. The tissue- and time-specific 
expression patterns of each module are presented in the same manner as before, (b) the expression levels and known PPIs of the core module 
genes are displayed. The shape of a gene represents its inference region, and the color denotes its expression level in that region. Blue lines 
denote known within-module protein-protein interactions (PPIs), while gray lines denote between-module PPIs. (HB: hemibrain (hemisphere), AY: 
amygdala, HS: hippocampus, MPFC: medial prefrontal cortex, VS: ventral striatum, SE: septal region and ST: stria terminalis; 5D-1D/10D: 5 day 
treatment, 1 day/10 day recovery, lOD-lD/ 6 W: 10 day treatment, 1 day/6 week recovery). 



(mTBI) [37], a disease related to PTSD. Finally, retinoids 
are important for the maintenance of the nervous sys- 
tem and may play a role in Alzheimer's disease [38]. 

The resulting 43 core genes also exhibit ample evidence 
for association with brain function and/or PTSD. In par- 
ticular, the genes Abca4, Fech Magoh, Ppplrl2b, and Uros 
were previously shown to be differentially expressed in a 
human PTSD signature discovered by Segman et al [8]. 
Seven of the 43 genes closely resemble genes from a blood 
signature for depression {Ahsp, Dhrs9, Map2k2, Slcl3a2, 
Slcl6al Slc39a3, U2afl) [39,40], while Hmbs, Pafahlbl, 
Sjrs2y and Yesl were previously identified as bipolar dis- 
order blood markers [41]. In addition, Ugt2bS and Slc6a9 
are also present in a blood signature for brain injury [42], 
while Dbk Itgbl, Ltc4s, and Rhoa were reported to be rele- 
vant to mTBI [43]. Many of the other genes have been 
associated with various mental illnesses and neurodege- 
nerative diseases, including Schizophrenia, Alzheimer's 
disease, and sleep disorder. Detailed associations and refer- 
ences are listed in Additional file 4: Table S5. 

Brain-brain network 

In a similar manner as before, we first used COMBINER 
to infer the top 100 candidate modules for each brain re- 
gion. We then identified common modules for each 
remaining brain region separately, removing features 
using Consensus Feature Elimination until the average 
AUC of the second region exceeded 0.75. Table 4 lists 
the final number of modules identified from each brain 
region pair, as well as the number of "core" modules and 
"core" genes for each brain tissue (i.e. those present in 
the majority of pair wise comparisons). In total, 37 core 
modules with 177 genes were identified in the brain- 
brain network. 



We list the final number of modules identified from 
each brain region pair, as well as the overall numbers of 
core modules and core genes for each region. Figure 3a 
displays the tissue- and time-specific expression patterns 
of each brain-brain core module. Figure 3b shows the 
expression levels of the genes in each module, as well as 
the known PPIs occurring between genes. Unlike the 
blood-brain network, the shape of a gene represents the 
brain region in which it was inferred. Table 5 provides 
the putative biological functions of the core modules as 
inferred, while detailed module information is summa- 
rized in Additional file 5: Table S4. 

In the brain-brain core module network. Modules 6, 8, 
33, and 15 are of particular interest. An active Module 6 
{Creb3l2, Prkx, Avp) in the hippocampus indicates a 
down regulated PKA-CREB long term potentiation path- 
way, which has been shown to impair memory [44]. In 
addition, the activity of Module 8 {Prkalb, Hspala, 
Nfl<bia, Jun, Cptlb) in the septal region shows down 
regulation of a heat shock protein (HSPAIA). Such ac- 
tivity has previously been found in other PTSD studies 
[45]. Module 33 depicts an up regulated dopamine path- 
way in the ventral striatum. This activity could potentially 
send excessive dopamine to the amygdala and other brain 
regions, which has been shown to lead to increased 
anxiety [46,47]. Finally, Module 15 implies an active pro- 
inflammatory response in the medial prefrontal cortex 
(MPFC) that agrees with the study in [48]. Other validated 
findings include olfactory impairment in the stria 
terminalis (ST) (module 32) [49]; alteration of comple- 
ment pathways in the MPFC (module 20) [50] and acti- 
vated coagulation function in the ST (module 31) [51]. 

The above findings highlight that while the putative 
biological functions of the brain-brain core modules 



Table 3 Identified final modules between blood and brain regions with the associated mean AUC and mean error rate 
of both tissues 
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Mean AUC 


0.73 


0.85 
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0.86 


0.76 


0.82 


0.84 






Mean error rate 


0.24 


0.24 


0.29 


0.22 


0.31 


0.27 


0.24 







The mean AUC and error rate were calculated by 500 LDA classifiers with random sampling on both tissues. {HB hemibrain (hemisphere), AY amygdala, HS 
hippocampus, MPFC medial prefrontal cortex, VS ventral striatum, SE septal region and 57 stria terminalis). 
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Table 4 Numbers of COMBINER modules Identified using 
data from six brain regions 



Val 
Inf 


AY 


HC 


ST 


MPFC 


SE 


VS 


Core 
module 


Core 
gene 


AY 


/ 


17 


25 


23 


0 


16 


1 


4 


HC 


17 


/ 


7 


25 


7 


18 


6 


28 


ST 


22 


29 


/ 


23 


7 


22 


9 


45 


MPFC 


22 


31 


22 


/ 


5 


22 


9 


41 


SE 


18 


26 


25 


17 


/ 


28 


10 


53 


VS 


22 


30 


13 


5 


5 


/ 


2 


6 



{HB Hemibrain (Hemisphere), /A/ amygdala, HS hippocampus, MPFC medial 
prefrontal cortex, VS ventral striatum, SE septal region and 57 stria terminalis). 



largely encompass the DEPATHs identified in the statis- 
tical overlap analysis (Tables 1 and 2), the COMBINER 
network-based analysis provides a much richer molecu- 
lar description of mouse responses to social defeat. With 
additional validation in human studies, we expect these 
findings to yield robust prognostic and diagnostic bio- 
markers for PTSD. 

Conclusions 

The identification of diagnostic and prognostic blood 
biomarkers for PTSD currently is of great interest. In 
this work, we have improved the COMBINER method— 
a computational framework for identifying gene expres- 



sion modules that are activated in common across ex- 
perimental conditions— and applied it to blood and brain 
data from a mouse social defeat model. The resulting 
gene networks highlight stress-related biological pro- 
cesses active in both brain and blood and provide a com- 
prehensive molecular description of social defeat. In 
total, our approach identified seven blood biomarker sets 
that have each been validated for classification perform- 
ance in one brain sub-region. Some of the genes and 
processes discovered are consistent with previous inde- 
pendent studies of PTSD or other mental illnesses, while 
others represent novel candidate PTSD biomarkers. We 
note that the same approach can be readily applied to 
other disease models to construct gene networks that 
are activated in common across tissues; future work will 
focus on this task. 

Methods 

Blood, organ and tissue collection 

Terminal organs, brain regions, and blood samples from 
subject and control C57BL/6 mice were collected after 
24 hours, or 6 weeks (42 days) post 10-day social stress, 
and 24 hours or 1.5 weeks post 5-day social stress. 
Brains of C57BL/6 mice were carefully removed from 
the skulls, and left or right hemi-brain from each 
defeated or control mouse was dissected into different 
anatomical and functional regions: Hemibrain (Hemi- 
sphere) (HB), amygdala (AY), hippocampus (HS), medial 



Table 5 The putative biological functions of the core modules in brain-brain network 



Pathway 


Module 


Pathway 


Module 


Steroid 


1 


Complement and coagulation 


20 


Proteolysis 


2 


Neurotrophin signaling 


21 


Notch signaling 


3 


Regulation of beta cells 


22 


Cell adhesion 


4 


Transmembrane transport 


23 


NOD like receptor 


5 


Myogenesis 


24 


Vasopressin 


6 


G alpha 13 protein 


25 


G alpha Q protein 


7 


HIVNEF pathway 


26 


PRARA 


8 


Steroid 


27 


Linoleic acid 


9 


Purine 


28 


Chemokine signaling 


10 


Oocyte meiosis 


29 


Neuroactive ligand receptor 


11 


Focal adhesion 


30 


Muscle contraction 


12 


Complement and Coagulation 


31 


Systemic lupus erythematosus 


13 


Olfactory transductoino 


32 


DNA Repair 


14 


Class A1 rhodopsin like receptors 


33 


IRS Related events 


15 


Host interaction of HIV factors 


34 


Post translational protein modification 


16 


Peptide ligand biding receptors 


35 


Arachidonic acid 


17 


Calcium signaling 


36 


ABC Transporters 


18 


Downstream TCR signaling 


37 


Phosphadylinositol signaling 


19 
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prefrontal cortex (MPFC), ventral striatum (VS), septal 
region (SE) and stria terminalis (ST). The number of 
defeated and control mice in different regions and con- 
ditions are summarized in Table 6. 

RNA isolation and quality assessment 

Total RNA was isolated according to the Trizol® method 
(Invitrogen Inc., Grand Island, NY) from homogenized 
whole blood and brain regions. RNA from blood was 
isolated using the PreAnalytiX PAXgene® blood RNA kit 
(Qiagen Inc., Valencia, CA). We collected the seven 
organ tissues from 5-6 control and defeat mice, respect- 
ively. We evaluated RNA integrity using the Agilent 
Bioanalyzer and excluded samples of low quality, which 
appears to either have low total RNA, or low ribosomal 
RNA (rRNA) mass ratio between 28S and 18S rRNA 
and high amount of non-ribosomal RNAs in the 
electropherograms. 

Microarray hybridization 

Microarray assays were performed using Agilent's gen- 
ome wide mouse expression array (GE 4x44K v2 two 
color microarray) slides and kits (Agilent Technologies 
Inc., Santa Clara, CA) following the manufacturers 
protocol. To minimize batch effects, each sample was 
hybridized with a universal common reference that was 
used for all experiments. Hybridized microarray slides 
were scanned using Agilent Technologies Scanner 
G2505C US09493743. 

Microarray data processing 

Genespring with feature extraction 10.x (Agilent, CA) was 
used to process all two-color chips. Log2 transformation, 
Lowess normalization, and quantile normalization were 
applied to normalize within and between microarrays. For 
the latter, we applied quantile normalization separately on 
data from each tissue. Outlier spots were converted to 

Table 6 Defeated and control mice (in the form of 
(number of defeated) / (number of control)) in different 
regions and conditions 



Condition 
Tissue 


5D-1D 


5D-10D 


10D-1D 


10D42D 


Blood 


(5) / (5) 


(5) / (5) 


(5) / (4) 


(5) / (5) 


HB (Hemibrain) 


5) / (5) 


(5) / (5) 


(5) / (5) 


(5) / (5) 


AY (Amygdala) 


(2) / (3) 


(4) / (5) 


(4) / (3) 


(4) / (3) 


HC (Hippocampus) 


(4) / (3) 


(6) / (4) 


(5) / (6) 


(5) / (5) 


MPFC (Medial Prefrontal Cortex) 


(5) / (4) 


(5) / (5) 


(4) / (3) 


(4) / (4) 


SE (Septal Region) 


(2) / (3) 


(3) / (2) 


(3) / (3) 


(3) / (3) 


ST (Stria Terminalis) 


(5) / (5) 


(5) / (5) 


(4) / (4) 


(5) / (4) 


VS (Ventral Striatum) 


(5) / (5) 


(5) / (5) 


(4) / (4) 


(4) / (5) 



missing values. If more than half of the expression values 
of a probe were missing, we removed the probe from con- 
sideration. We then imputed missing values using the k- 
nearest neighbor imputation method. To avoid incurring a 
bias in favor of genes represented by a greater number of 
probes, we aggregated multiple probes from the same 
Entrez Gene together by computing the mean of the "sib- 
ling" probes. We have deposited all microarray data for 
this study at the Gene Expression Omnibus (GEO): 

http://www.ncbi. nlm.nih.gov/geo/query/acc.cgi?acc= 
GSE45035. 

Linear model 

We used a linear model-based approach to deconvolve 
the experimental time effects from the social defeat ex- 
pression data. Assuming log-additive effects, our method 
estimates the contributions of each of the four experi- 
mental time points and subtracts them away from the 
remaining effect of social defeat. The linear model we 
used to deconvolve the experimental time effect is de- 
fined as follows: 



A" 




"1 


1 


0 


0 


0 


Ci 




0 


1 


0 


0 


0 






1 


0 


1 


0 


0 


C2 




0 


0 


1 


0 


0 


D3 




1 


0 


0 


1 


0 


C3 




0 


0 


0 


1 


0 






1 


0 


0 


0 


1 


C4 




0 


0 


0 


0 


1 



^defeat 



(1) 



(5D-1D/10D: 5 day treatment, 1 day/10 day recovery, 10D - 1 D/ 6 W: 10 day 
treatment, 1 day/6 week recovery). 



where Di and Q / = 1, 4 denote log2 gene expression 
values of defeated and control mice in condition /, a^efeat 
denotes the overall effect of social defeat and ^1, ^4 
are the undesired time effects. In practice, we solve the 
above over determined system for each gene separately 
using least squares (implemented in the limma package), 
carrying forward only the gene-specific defeat effect for 
subsequent analyses. 

Differential expression analysis 

As described above in Section 2, we used the R/Biocon- 
ductor limma package and iterative Group Analysis 
(iGA) method for differentially expressed gene and GO 
term/pathway identification, respectively. 

COMBINER 

As shown in Figure 4, the COMBINER method first 
infers the statistically discriminative modules from an 
inference dataset, then validates them in various valid- 
ation sets using consensus feature elimination. If a vali- 
dated final module is co-expressed in at least half of the 
validation sets, then it is defined as a core module. 
Finally, we project these core modules onto known PPI 
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COMBINER 



Consensus Feature Elimination 



Inferred 
modules 



Candidate 



500 random 



500 classifiers 




Figure 4 Schematic overview of COMBINER. COMBINER first infers candidate modules as activity vectors from each pathway in an inference 
dataset. It then validates these modules in validation datasets by regenerating activity vectors and performing supervised classification. Finally, the 
modules present in at least half of the validation sets are considered to be core modules. The resulting core module markers are then projected 
onto a known protein-protein interaction network. We generated 250 groups of 500 classifiers in parallel using IDA with recursive feature 
elimination. Both the classifier AUC and weight vectors were computed, and each feature was then ranked by its average normalized weight. The 
most consistently low-ranking feature was then removed recursively until the average AUC threshold was achieved. At this point, the remaining 
markers were considered to comprise the final modules. 



networks. To remove features, we generated 250 groups 
of 500 classifiers in parallel and applied Linear Discrimin- 
ant Analysis (LDA) with recursive feature elimination [52] 
to each to compute AUCs as well as weight vectors. Each 
feature was then ranked by its average normalized weight. 
The most consistently low-ranking feature was then re- 
moved recursively until the average AUC threshold of 0.75 
was achieved. At this point, the remaining features were 
considered to comprise the final modules. 

In our previous work [28], we applied both the 
Condition Responsive Genes (CORG) [53] and Core 
Module Inference (CMI) [28] methods to infer can- 
didate modules and express them as pathway activ- 
ities (PAs). In the greedy search process, CORG 
picks up either up- or down-regulated genes, while 
CMI identifies genes of both directions together. 
However, because of the multiple-time-point nature 
of the social defeat data, the application of the CMI 
method is not straightforward. Thus, in this work we 
used only the CORG method with the procedure de- 
scribed as follows. For a given pathway, we first rank 
the standardized gene expressions by their limma 
moderated t-score. If up-regulated genes are domin- 
ant, we rank the t-score in descending order; other- 
wise, ascending order is chosen. Next, we aggregate 
the first two genes using the formula y= (^i+^2)/\/2; 
if the expression of this aggregate yields a larger ab- 
solute t-score than the first gene, this combination is 
retained as a module with the combined expression 
becoming the PA. Otherwise, the procedure further 
adds the third gene using y = (xi +^2 +x3)/\/3, and so 



on, until the module-size limit, 25 genes, is reached. 
Finally, we ranked all modules using the absolute 
value of the pathway activity t-score. 

We faced two major challenges when modifying our 
COMBINER method. First, the multiple-time-point na- 
ture of the data initially decreased the binary classifi- 
cation performance of the static LDA classifier [52]. 
Second, the small data sample size leads to a large vari- 
ability of feature ranks after recursive feature elimin- 
ation. To cope with the first challenge, we used a linear 
model to deconvolve the time effects from the original 
expression values. We solved the second problem by im- 
proving our method for consensus feature elimination. 
We generated 250 groups of 500 classifiers in parallel, 
then removed the bottom feature using the voting 
principle. In general, using additional groups of classi- 
fiers will further improve the reproducibility of the final 
modules. In our experience 250 groups were sufficient 
to yield a reproducible result (results not shown). Finally, 
we used a fixed average AUC threshold to determine the 
final modules instead of the max average AUC threshold 
described in [28]. This was required since the inference 
and validation sets can be very dissimilar, which leads to 
low values of the max average AUC. 

We obtained pathway information from the MSigDB 
v3.0 Canonical Pathways subset [54]. To decrease redun- 
dancy, we applied pathway filtering to remove bulky path- 
ways. This resulted in a pathway dataset containing 791 
pathways with 5,633 genes assayed in all regions. The 
protein-protein interaction information was obtained from 
String v9.0 [55]. 
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Average time curve 

Let yij be the relative expression level of gene / at the 
sampling time The time-point expression patterns 
were modelled as follows, 

yij=f^i{tj)+eij (2) 

where l^i{tj) = log2(|^^(^;)/^^(^;) |) is the population 
average time curve for gene / evaluated at time tj and 
where 8ij is the random deviation from this curve, {tj) 
and^^(^y) are average expressions of disease and control 
mice respectively for gene / at time tj^ 

Software 

COMBINER was implemented in Matiab R2010a with Bio- 
informatics toolbox v3.5 (Math Works Inc., Natick, MA), 
statconn (http://www.statconn.com/), LinkR (http://www. 
mathworks.com/matlabcentral/fileexchange/5051), and R 
[56]. The source code can be found in Additional files 6 
and 7. 

Availability of supporting data 

The supporting data is provided in the supporting 
information. 
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